An Ontology Design Pattern for Microblog Entries

Authors

  • Cogan Shimizu
  • Michelle Cheatham
Abstract

Due to the exponential growth of the Internet of Things and the use of Social Media Platforms, observers have an unprecedented level of detailed information available on the behavior of communities. However, due to the highly heterogeneous nature and the immense volume of the data, a composite view is difficult to generate. Such a composite view would be exceptionally useful in the realms of insider threat detection, after-action forensics, and hazardous situation detection and avoidance. The Semantic Web, via ontology modeling, offers a powerful tool for fusing the disparate data sources and formats. To this end, we have created an ontology design pattern (ODP) for the modeling of a simple microblog entry. This ODP is intended to fit within an ecosystem for fusing social media, support advanced visualization, and provide a preliminary framework for trust assessment.

1 Motivation & Scope

In recent years, access to data has become increasingly trivial as Social Media Platforms and the Internet of Things (IoT) continue to grow. However, important latent or implicit information runs the risk of being obscured simply by the sheer volume of collected data. Further, the data is presented and accessed via highly disparate vectors (e.g. microblog entries, visual media, and geotagged textual data). Thus, it is increasingly necessary to identify and develop methods for seamless fusion and visualization of information extracted from heterogeneous social media data. Such methods are especially important for obtaining an accurate and comprehensive view of a crisis theater or battlespace (e.g. formulating a “Common Operating Picture”). For these use cases, it is also important to take into account the provenance and trustworthiness of the acquired data and of any conclusions drawn from such data. To support the fusion of such heterogeneous data and the capture of its metadata, we will build an ecosystem of ontology design patterns [6].
ODPs enable sophisticated visualizations that leverage the inherent concept hierarchy, such as models displaying varying levels of granularity and interconnectedness. Figure 1 provides two examples of possible visualization methods that the microblog entry (MBE) pattern will help support. We are currently investigating other visualizations in collaboration with domain experts from the United States Air Force.

Footnote 1: A Common Operating Picture is a single identical display of relevant operational information on materiel shared by more than one Command. This term is frequently ...

In this paper, we describe a pattern for an MBE as an entry point into developing the ecosystem. The MBE pattern is important for a number of reasons. First, microblog entries are representative of a fairly large subset of publicly available social media data. For example, Twitter, the popular, public-facing microblogging platform, allows a Tweet’s payload to contain text, hyperlinks, images, or video. The entries may also be geotagged and may explicitly refer to other users. Additionally, there are many existing datasets that capture Tweets during natural disasters and humanitarian crises (e.g. CrisisLex). By definition and intent, microtext is simple; its model is relatively straightforward and requires little of the complexity that OWL brings to the table. Regardless, it is important to note that this pattern is a fundamental building block of the intended ODP ecosystem. Because of its simplicity, it is also relatively straightforward to fit with many existing patterns. Specifically, we foresee easy integration with the ModifiedHazardousSituation Design Pattern [4] and ReportingEvent [7]. As the ecosystem matures, we also foresee including existing patterns regarding maps, climate, and public infrastructure. Finally, the MBE pattern has some components that allow for interesting interaction: spatiotemporal extent and author trustworthiness.
Spatiotemporal extent of information is of particular interest to the modeling community as there are still many open questions on its handling. However, it is an integral part of any sort of response or intelligence operation. In a perfect world, we could assume that no author seeks to mislead or to propagate lies. However, in light of recent events, as well as the ODP’s relevance to crisis and operational intelligence management, it is necessary to include a component for the trustworthiness of an author.

Thus, the model for the microblog entry seeks to answer, at least, the following competency questions. Due to the strong emphasis on geospatial and temporal components of the fused data, we assume that these queries will be executed using GeoSPARQL.

1. Who is the author of entry x?
2. What are all the entries authored by y?
3. What entries from time A to time B originate from region of interest C with radius D?
4. What is the trust value v for author y?
5. What is the trust value v for entry x?
6. What entries from authors with a trust value greater than v originate from a region of interest C with radius D?
7. What entries relate to topic T?

Fig. 1: (a) A Circle Packing visualization generated by D3. Smaller circles are related to the superimposed circle via subsumption, and proximity within the same level of circles denotes a short semantic distance. (b) A standard view of geographic information: pins on a map background. This visualization can be updated in real time and allows the user to see incoming data. Both visualizations will utilize the MBE pattern at the most granular level (i.e. smallest circles and map pins).

Footnote 2: https://twitter.com
Footnote 3: http://crisislex.org/
Footnote 4: Microtext is any sufficiently short parcel of information in natural language. An MBE is an instance of microtext.
Footnote 5: http://www.opengeospatial.org/standards/geosparql

Microtext is a valuable resource in the Semantic Web Community, as evidenced by [2, 9, 10, 8].
However, to our knowledge, this is the first attempt at modeling an MBE as an entity, instead of only modeling extracted information.

The rest of the paper is organized as follows. Section 2 will address the design decisions in the structure of the pattern and accompanying axioms. Section 3 provides a motivating example and interaction with real data. Section 4 addresses future work and collaborations.
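
As a rough illustration of the competency questions in Section 1, the following Python sketch answers several of them over in-memory records. It is not the pattern itself, and it stands in for the GeoSPARQL queries the text assumes. Every name here (`Author`, `MicroblogEntry`, `haversine_km`, the query functions, the `trust` field as a number in [0, 1], and the sample data) is a hypothetical choice made for this example. A "region of interest C with radius D" is assumed to mean a circle around a latitude/longitude point, measured by great-circle distance. The trust value of an entry (question 5) is left out because the excerpt does not say how it is derived.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Author:
    handle: str
    trust: float  # hypothetical trust value, assumed to lie in [0, 1]


@dataclass(frozen=True)
class MicroblogEntry:
    entry_id: str
    author: Author
    text: str
    created: datetime
    lat: float  # latitude of the geotag, in degrees
    lon: float  # longitude of the geotag, in degrees
    topics: frozenset = frozenset()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def _within(entry, center, radius_km):
    """True if the entry's geotag lies inside the circle (center, radius_km)."""
    return haversine_km(entry.lat, entry.lon, center[0], center[1]) <= radius_km


def author_of(entries, entry_id):
    """Question 1: who is the author of entry x?"""
    return next(e.author for e in entries if e.entry_id == entry_id)


def entries_by(entries, author):
    """Question 2: all entries authored by y."""
    return [e for e in entries if e.author == author]


def entries_in_window(entries, start, end, center, radius_km):
    """Question 3: entries from time A to time B inside region C with radius D."""
    return [e for e in entries
            if start <= e.created <= end and _within(e, center, radius_km)]


def trusted_entries_near(entries, min_trust, center, radius_km):
    """Question 6: entries in the region whose author's trust exceeds min_trust."""
    return [e for e in entries
            if e.author.trust > min_trust and _within(e, center, radius_km)]


def entries_about(entries, topic):
    """Question 7: entries that relate to topic T."""
    return [e for e in entries if topic in e.topics]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    alice = Author("alice", 0.9)
    bob = Author("bob", 0.2)
    data = [
        MicroblogEntry("e1", alice, "Flooding on Main St",
                       datetime(2017, 1, 1, 12, 0), 39.76, -84.19, frozenset({"flood"})),
        MicroblogEntry("e2", bob, "All clear here",
                       datetime(2017, 1, 1, 13, 0), 39.77, -84.20, frozenset({"weather"})),
    ]
    near = trusted_entries_near(data, 0.5, (39.76, -84.19), 50.0)
    print([e.entry_id for e in near])
```

Question 4 reduces to reading `author.trust` directly. In an RDF setting the same filters would more likely be written as triple patterns with a spatial filter, but that depends on the axioms given in Section 2.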


Similar Resources

Cell Line Ontology: Redesigning the Cell Line Knowledgebase to Aid Integrative Translational Informatics

The Cell Line Ontology (CLO) is a community-based ontology in the domain of biological cell lines with a focus on permanent cell lines from culture collections. Upper ontology structures that frame the skeleton of CLO include the Basic Formal Ontology and Relation Ontology. Cell lines contained in CLO are associated with terms from other ontologies such as Cell Type Ontology, NCBI Taxonomy, and...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

What is the Conversation About?: A Topic-Model-Based Approach for Analyzing Customer Sentiments in Twitter

In Social Commerce customers evolve to be an important information source for companies. Customers use the communication platforms of Web 2.0, for example Twitter, in order to express their sentiments about products or discuss their experiences with them. These sentiments can be very important for the development of products or the enhancement of marketing strategies. The research goal is to an...

A Content Ontology Design Pattern for Software Metrics

This paper presents a content ontology design pattern for the representation of software metrics, in software engineering ontologies, called OOPMetrics. This content ontology design pattern is designed to ease the detection of software design flaws based on the metrics that are defined in the ontology that uses it. We also present a case study that shows how an ontology that uses this pattern m...

An Ontology Design Pattern of the Multidisciplinary and Complex Field of Climate Change

This article presents the manual and collaborative construction of an ontology design pattern (a generic ontology), named OntoCLUVA, for the field of climate change (CC). This pattern is built to support the construction of climate change ontologies. We used this pattern for a knowledge management system (KMS) of climate change. It will allow to each module of this K...

Publication date: 2017